Smart Mobility Analysis
Focus Group: Transport and Mobility
- Authored by: Chathu Siriwardena
- Duration: 10 Weeks
- Level: Intermediate
- Pre-requisite Skills: Python, Data Wrangling, Data Visualisation, Data Modeling, Machine Learning, Deep Learning, Geographical Coordinates Handling
This use case focuses on integrating pedestrian traffic, tree canopy coverage, and weather conditions to improve urban mobility, promote sustainable commuting, and enhance city planning. By leveraging data-driven insights, the City of Melbourne can design walkable, climate-resilient, and commuter-friendly urban spaces.
User Story
As a commuter, I want to find shaded and pedestrian-friendly routes so that I can walk comfortably, even during extreme weather.
As a city planner, I want to identify heat-prone pedestrian areas so that we can prioritise tree planting and optimise transport connectivity.
As a business owner, I want to understand foot traffic patterns near my store so that I can adjust my operations based on customer movement trends.
At the end of this use case you will:
- Learn to import datasets using the Melbourne Open Data Explore API v2.1.
- Gain proficiency in merging multiple datasets to create a comprehensive view.
- Learn data visualisation using matplotlib and seaborn
- Understand geospatial analysis by working with geolocations using libraries like Geopy and Folium to map pedestrian routes, transport networks, and tree canopy coverage.
- Develop a "Cool Routes" scoring model that combines tree canopy data, pedestrian counts, and weather conditions to identify optimal walking paths for heat resilience.
- Develop regression models and basic Feed-Forward Neural Networks (FFNN) to predict foot traffic demand based on weather conditions and tree canopy availability for sustainable and efficient commuting.
- Evaluate the impact of the Green Commute on pedestrian satisfaction and heat stress reduction
Data Sets Used:
Data Set 1. Pedestrian Counting System
This data set contains ID, Location ID, Sensing Date, Hour Day, Direction 1, Direction 2, Pedestrian Count, Sensor Name and Location. The data set was used to identify the movements of pedestrians around the city area. The dataset is imported from the Melbourne Open Data website, using API v2.1.
Data Set 2. Tree Canopies Data
This data set contains geo_point_2d, geo_shape, objectid, shape_leng and shape_area. It maps the tree canopy within the City of Melbourne, captured from 2018 aerial photography and LiDAR. The dataset is imported from the Melbourne Open Data website, using API v2.1.
Data Set 3. Bus Stop Data
This data set contains geo_point_2d, geo_shape, prop_id, addresspt1, addressp_1, asset_clas, asset_type, objectid, str_id, addresspt, asset_subt, model_desc, mcc_id, roadseg_id, descriptio, model_no. This data set shows the locations of the bus stops within the City of Melbourne. The dataset is imported from the Melbourne Open Data website, using API v2.1.
Data Set 4. City Circle Tram Stops Data
This data set contains geo_point_2d, geo_shape, name, xorg, stop_no, mccid_str, xsource, xdate, mccid_int. It covers the City Circle tram stops within the City of Melbourne. The dataset is imported from the Melbourne Open Data website, using API v2.1.
Data Set 5. Microclimate Sensors Data
This data set contains device_id, received_at, sensorlocation, latlong, minimumwinddirection, averagewinddirection, maximumwinddirection, minimumwindspeed, averagewindspeed, gustwindspeed, airtemperature, relativehumidity, atmosphericpressure, pm25, pm10, noise. It contains climate readings from sensors located within the City of Melbourne. The dataset is imported from the Melbourne Open Data website, using API v2.1.
Outline of the Use Case
Data Preprocessing
I started the use case by cleaning and preparing each dataset for analysis. This involved handling missing values and duplicates: removing or imputing missing values in latitude, longitude, and other critical fields.
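A minimal sketch of this cleaning step (the column names follow the pedestrian dataset loaded later; dropping rather than imputing missing coordinates is one possible choice, not the only one):

```python
import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and records missing critical coordinate fields."""
    df = df.drop_duplicates()
    # Coordinates are hard to impute sensibly, so rows missing them are removed
    df = df.dropna(subset=['latitude', 'longitude'])
    return df.reset_index(drop=True)

demo = pd.DataFrame({
    'latitude': [-37.81, -37.81, None],
    'longitude': [144.96, 144.96, 144.97],
    'pedestriancount': [276, 276, 14],
})
cleaned = basic_clean(demo)
print(cleaned.shape)
```

The same pattern applies to each of the five datasets before they are merged.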
Data Visualisation
- Interactive Maps: Used tools like Folium to create an interactive map showing the distribution of foot traffic, canopy coverage and public transport stops.
- Bar charts, stacked bar charts, pie charts, multiple bar charts and other graphs and tables were used to identify the key insights.
Feature Engineering
Next, I created features that help the model understand the relationship between pedestrian traffic, tree canopy coverage, and weather conditions.
Weather Index Calculation: This index measures how comfortable the weather is for walking based on temperature, wind speed, and humidity.
Canopy Coverage Ratio : This is a measure of how much of a pedestrian area is covered by tree canopy. It is calculated as the proportion of pedestrian observations that fall within areas shaded by tree canopies.
Stress Index Calculation: This index estimates pedestrian stress, increasing when foot traffic is high and tree canopy coverage is low.
Walkability Score Calculation: This score combines pedestrian activity, tree canopy, low stress levels, and favorable weather into a single value to assess how walkable an area is. Higher scores indicate better walking environments.
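The four features above can be sketched as follows. The weightings, reference values and normalisations here are illustrative assumptions for demonstration, not the exact formulas used in the notebook:

```python
import pandas as pd

# Two toy observations: a mild, shaded street and a hot, exposed one
df = pd.DataFrame({
    'airtemperature': [18.0, 34.0],
    'averagewindspeed': [1.5, 6.0],
    'relativehumidity': [55.0, 80.0],
    'pedestriancount': [400, 2500],
    'canopy_ratio': [0.6, 0.1],   # share of observations falling under canopy
})

# Weather index: penalise deviation from a comfortable ~21 C, strong wind
# and high humidity (higher = more comfortable; weights are assumptions)
df['weather_index'] = 1 - (
    0.5 * (df['airtemperature'] - 21).abs() / 20
    + 0.25 * df['averagewindspeed'] / 10
    + 0.25 * df['relativehumidity'] / 100
).clip(0, 1)

# Stress index: rises with foot traffic, falls with canopy coverage
df['stress_index'] = (df['pedestriancount'] / df['pedestriancount'].max()) * (1 - df['canopy_ratio'])

# Walkability: equal-weight blend of activity, shade, low stress, good weather
df['walkability'] = (
    0.25 * df['pedestriancount'] / df['pedestriancount'].max()
    + 0.25 * df['canopy_ratio']
    + 0.25 * (1 - df['stress_index'])
    + 0.25 * df['weather_index']
)
print(df[['weather_index', 'stress_index', 'walkability']].round(2))
```

With these assumed weights, the shaded mild street scores higher on walkability than the hot exposed one, which is the behaviour the score is designed to capture.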
- Model Selection and Model Building
- Geospatial Clustering Models:
- DBSCAN: DBSCAN clustering was applied to group street segments that share similar characteristics such as pedestrian activity, weather comfort, canopy coverage, and environmental stress. Clustering transforms raw, multidimensional urban data into actionable intelligence, supporting data-driven decisions for improving pedestrian experiences, reducing heat stress, and enhancing city livability.
- K-means: Clustered areas to analyse the distribution of walkability scores across clusters.
- Regression Model:
- Multiple Linear Regression/GLM: Predicted the walkability score from features such as the weather index and stress index.
- Logistic Regression: A binary model created to classify whether the walkability score is sufficient or insufficient based on the input features.
- Random Forests/Gradient Boosting:
For more complex relationships, Random Forest and Gradient Boosting models were used to predict walkability sufficiency from multiple features, including geospatial ones.
- Deep Learning Approach for Predicting walkability with Custom Metrics (FFNN)
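The clustering step can be sketched on synthetic street-segment features. The feature values and DBSCAN parameters below are illustrative assumptions, not the tuned values from the notebook:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic street-segment features: pedestrian activity, weather comfort,
# canopy coverage, stress (illustrative values, not real sensor data)
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0.8, 0.7, 0.6, 0.2], 0.05, size=(30, 4)),  # comfortable, shaded
    rng.normal([0.3, 0.3, 0.1, 0.8], 0.05, size=(30, 4)),  # hot, exposed
])
X_scaled = StandardScaler().fit_transform(X)

# DBSCAN groups dense regions of similar segments (and flags outliers as -1);
# K-means partitions the same feature space into a fixed number of clusters
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X_scaled)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_scaled)
print('DBSCAN labels found:', sorted(set(db_labels)))
print('K-means labels found:', sorted(set(km_labels)))
```

Scaling before clustering matters here: without it, features on larger numeric ranges would dominate the distance calculations.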
- Model Evaluation Metrics
Evaluated the models using the following metrics:
- Mean Absolute Error (MAE) / Mean Squared Error (MSE): For the regression models predicting the walkability score.
- Clustering metrics: Evaluated the density-based clusters using silhouette score.
- Classification metrics: For logistic regression, accuracy, precision, recall, and F1-score were used to assess walkability sufficiency.
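These metrics are all available in scikit-learn, which the notebook imports later. The values below are illustrative stand-ins rather than real model outputs:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             accuracy_score, precision_score, recall_score,
                             f1_score, silhouette_score)

# Regression: illustrative true vs predicted walkability scores
y_true = np.array([0.62, 0.45, 0.80, 0.30])
y_pred = np.array([0.60, 0.50, 0.75, 0.35])
print('MAE:', mean_absolute_error(y_true, y_pred))
print('MSE:', mean_squared_error(y_true, y_pred))

# Classification: illustrative binary "sufficient walkability" labels
c_true = np.array([1, 0, 1, 0, 1, 1])
c_pred = np.array([1, 0, 1, 1, 0, 1])
print('Accuracy:', accuracy_score(c_true, c_pred))
print('Precision:', precision_score(c_true, c_pred))
print('Recall:', recall_score(c_true, c_pred))
print('F1:', f1_score(c_true, c_pred))

# Clustering: silhouette score needs the feature matrix and cluster labels
X = np.array([[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.95, 0.85]])
labels = np.array([0, 0, 1, 1])
print('Silhouette:', silhouette_score(X, labels))
```

A silhouette score near 1 indicates tight, well-separated clusters; values near 0 suggest overlapping clusters.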
Outputs
Walkability Score and Categorisation: Developed a scoring system to rank each area within the City of Melbourne by its walkability score.
Importing Required Libraries
The below code imports a range of libraries essential for data analysis, visualisation, mapping, and interactivity. requests is used to fetch data from APIs, while pandas and numpy support data manipulation and numerical operations. StringIO helps handle in-memory text data, such as loading CSVs from strings. For geolocation tasks, geopy and its Nominatim geocoder are used to convert place names into coordinates. folium enables the creation of interactive maps, and ipywidgets along with IPython.display allow for interactive elements within a Jupyter Notebook. Visualisation is handled by seaborn and matplotlib.pyplot, with Patch from matplotlib.patches used for custom legends or shapes in plots, and the datetime module is included for working with date and time data, which is often crucial in temporal analysis.
import requests
import pandas as pd
import numpy as np
from io import StringIO
import datetime
import geopy
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import folium
from ipywidgets import interact, widgets
from IPython.display import display
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import re
import matplotlib.colors as mcolors
from geopy.extra.rate_limiter import RateLimiter
from geopy.distance import geodesic
from sklearn.cluster import KMeans,DBSCAN
from sklearn.preprocessing import MinMaxScaler,StandardScaler, PowerTransformer
import geopandas as gpd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc, silhouette_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestRegressor
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import backend as K
from IPython.display import display, clear_output
import json
from shapely.geometry import shape
from shapely.geometry import Point
from folium.plugins import HeatMap
Loading all Data Sets
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='pedestrian-counting-system-monthly-counts-per-hour'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
url_content=response.content.decode('utf-8')
pedestrian_df=pd.read_csv(StringIO(url_content),delimiter=';')
print(pedestrian_df.head(10))
else:
print(f'Request failed with status code {response.status_code}')
| id | location_id | sensing_date | hourday | direction_1 | direction_2 | pedestriancount | sensor_name | location | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 72120220515 | 72 | 2022-05-15 | 1 | 104 | 172 | 276 | ACMI_T | -37.81726338, 144.96872809 |
| 1 | 471720240917 | 47 | 2024-09-17 | 17 | 1273 | 894 | 2167 | Eli250_T | -37.81258467, 144.9625775 |
| 2 | 172320211101 | 17 | 2021-11-01 | 23 | 8 | 6 | 14 | Col15_T | -37.81362543, 144.97323591 |
| 3 | 171820230726 | 17 | 2023-07-26 | 18 | 267 | 383 | 650 | Col15_T | -37.81362543, 144.97323591 |
| 4 | 24820250405 | 24 | 2025-04-05 | 8 | 213 | 218 | 431 | Col620_T | -37.81887963, 144.95449198 |
| 5 | 54320240224 | 54 | 2024-02-24 | 3 | 13 | 5 | 18 | Swa607_T | -37.804024, 144.96308399 |
| 6 | 50420250303 | 50 | 2025-03-03 | 4 | 1 | 0 | 1 | Lyg309_T | -37.79808192, 144.96721013 |
| 7 | 143020250508 | 143 | 2025-05-08 | 0 | 52 | 19 | 71 | Spencer_T | -37.821728, 144.95557015 |
| 8 | 21720221210 | 2 | 2022-12-10 | 17 | 1671 | 1129 | 2800 | Bou283_T | -37.81380668, 144.96516718 |
| 9 | 391320231018 | 39 | 2023-10-18 | 13 | 204 | 203 | 407 | AlfPl_T | -37.81379749, 144.96995745 |
Data Set 2: Tree Canopies Data.
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='tree-canopies-public-realm-2018-urban-forest'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
url_content=response.content.decode('utf-8')
tree_canopies_df=pd.read_csv(StringIO(url_content),delimiter=';')
print(tree_canopies_df.head(10))
else:
print(f'Request failed with status code {response.status_code}')
geo_point_2d \
0 -37.81304517121492, 144.98612858745977
1 -37.813031352270215, 144.98264073647684
2 -37.81261020314892, 144.96112288812233
3 -37.81219284514014, 144.93846977801448
4 -37.81239953857732, 144.95122560445583
5 -37.813040580695024, 144.98654806873841
6 -37.81231922742188, 144.9447777601162
7 -37.81218994603368, 144.94262980622725
8 -37.81245033141797, 144.98815520131134
9 -37.81244314561024, 144.9495590639178
geo_shape objectid shape_leng \
0 {"coordinates": [[[[144.98613240697972, -37.81... 10373 2.692370
1 {"coordinates": [[[[144.98267255431483, -37.81... 10379 55.155123
2 {"coordinates": [[[[144.96112403835852, -37.81... 10380 6.279844
3 {"coordinates": [[[[144.93847665550007, -37.81... 10399 7.048844
4 {"coordinates": [[[[144.95122528646937, -37.81... 10400 2.794252
5 {"coordinates": [[[[144.98655398642268, -37.81... 10385 4.334477
6 {"coordinates": [[[[144.94478614151308, -37.81... 10387 8.128402
7 {"coordinates": [[[[144.9426325334389, -37.812... 10438 7.923251
8 {"coordinates": [[[[144.98817843816417, -37.81... 10669 10.680974
9 {"coordinates": [[[[144.94963279484296, -37.81... 10393 60.743893
shape_area
0 0.488406
1 125.461002
2 2.816221
3 3.643475
4 0.612298
5 1.348686
6 4.911725
7 4.257095
8 4.845412
9 165.844186
Data Set 3: Bus Stops Data.
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='bus-stops'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
url_content=response.content.decode('utf-8')
bus_stops_df=pd.read_csv(StringIO(url_content),delimiter=';')
print(bus_stops_df.head(10))
else:
print(f'Request failed with status code {response.status_code}')
geo_point_2d \
0 -37.80384165792465, 144.93239283833262
1 -37.81548699581418, 144.9581794249902
2 -37.81353897396532, 144.95728334230756
3 -37.82191394843844, 144.95539345270072
4 -37.83316401267591, 144.97443745130263
5 -37.79436108568101, 144.92998424529242
6 -37.817452093555325, 144.96168480565794
7 -37.82146476463953, 144.9303191551562
8 -37.837547087144706, 144.98191138368836
9 -37.812490976626215, 144.95370614040704
geo_shape prop_id addresspt1 \
0 {"coordinates": [144.93239283833262, -37.80384... 0 76.819824
1 {"coordinates": [144.9581794249902, -37.815486... 0 21.561304
2 {"coordinates": [144.95728334230756, -37.81353... 0 42.177187
3 {"coordinates": [144.95539345270072, -37.82191... 0 15.860434
4 {"coordinates": [144.97443745130263, -37.83316... 0 0.000000
5 {"coordinates": [144.92998424529242, -37.79436... 0 3.105722
6 {"coordinates": [144.96168480565794, -37.81745... 0 7.239726
7 {"coordinates": [144.9303191551562, -37.821464... 0 32.180664
8 {"coordinates": [144.98191138368836, -37.83754... 0 41.441167
9 {"coordinates": [144.95370614040704, -37.81249... 0 16.143764
addressp_1 asset_clas asset_type objectid str_id \
0 357 Signage Sign - Public Transport 355 1235255
1 83 Signage Sign - Public Transport 600 1231226
2 207 Signage Sign - Public Transport 640 1237092
3 181 Signage Sign - Public Transport 918 1232777
4 0 Signage Sign - Public Transport 1029 1271914
5 112 Signage Sign - Public Transport 1139 1577059
6 268 Signage Sign - Public Transport 1263 1481028
7 298 Signage Sign - Public Transport 2527 1245221
8 78 Signage Sign - Public Transport 2922 1248743
9 99 Signage Sign - Public Transport 5111 1253565
addresspt asset_subt model_desc mcc_id \
0 570648 NaN Sign - Public Transport 1 Panel 1235255
1 548056 NaN Sign - Public Transport 1 Panel 1231226
2 543382 NaN Sign - Public Transport 1 Panel 1237092
3 103975 NaN Sign - Public Transport 1 Panel 1232777
4 0 NaN Sign - Public Transport 1 Panel 1271914
5 616011 NaN Sign - Public Transport 1 Panel 1577059
6 527371 NaN Sign - Public Transport 1 Panel 1481028
7 110521 NaN Sign - Public Transport 1 Panel 1245221
8 107419 NaN Sign - Public Transport 1 Panel 1248743
9 602160 NaN Sign - Public Transport 1 Panel 1253565
roadseg_id descriptio model_no
0 21673 Sign - Public Transport 1 Panel Bus Stop Type 13 P.16
1 20184 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
2 20186 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
3 22174 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
4 22708 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
5 21693 Sign - Public Transport 1 Panel Bus Stop Type 1 P.16
6 20171 Sign - Public Transport 1 Panel Bus Stop Type 3 P.16
7 30638 Sign - Public Transport 1 Panel Bus Stop Type 3 P.16
8 22245 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
9 20030 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16
Data Set 4: City Circle Tram Stops Data.
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='city-circle-tram-stops'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
url_content=response.content.decode('utf-8')
tram_stops_df=pd.read_csv(StringIO(url_content),delimiter=';')
print(tram_stops_df.head(10))
else:
print(f'Request failed with status code {response.status_code}')
geo_point_2d \
0 -37.82023778673241, 144.95786314283018
1 -37.82097269970027, 144.95546153614245
2 -37.82190465062153, 144.95109855638137
3 -37.811771476718356, 144.95644059700524
4 -37.81105928060848, 144.95891745116262
5 -37.80961884837298, 144.96384957029932
6 -37.808876998255194, 144.96634474519394
7 -37.81358116790275, 144.97406360491075
8 -37.8176316450406, 144.96690455927876
9 -37.818324403770184, 144.964479208357
geo_shape \
0 {"coordinates": [144.95786314283018, -37.82023...
1 {"coordinates": [144.95546153614245, -37.82097...
2 {"coordinates": [144.95109855638137, -37.82190...
3 {"coordinates": [144.95644059700524, -37.81177...
4 {"coordinates": [144.95891745116262, -37.81105...
5 {"coordinates": [144.96384957029932, -37.80961...
6 {"coordinates": [144.96634474519394, -37.80887...
7 {"coordinates": [144.97406360491075, -37.81358...
8 {"coordinates": [144.96690455927876, -37.81763...
9 {"coordinates": [144.964479208357, -37.8183244...
name xorg stop_no mccid_str xsource \
0 Melbourne Aquarium / Flinders Street GIS Team 2 NaN Mapbase
1 Spencer Street / Flinders Street GIS Team 1 NaN Mapbase
2 The Goods Shed / Wurundjeri Way GIS Team D5 NaN Mapbase
3 William Street / La Trobe Street GIS Team 3 NaN Mapbase
4 Queen Street / La Trobe Street GIS Team 4 NaN Mapbase
5 Swanston Street / La Trobe Street GIS Team 6 NaN Mapbase
6 Russell Street / La Trobe Street GIS Team 7 NaN Mapbase
7 Parliament / Collins Street GIS Team 8 NaN Mapbase
8 Swanston Street / Flinders Street GIS Team 5 NaN Mapbase
9 Elizabeth Street / Flinders Street GIS Team 4 NaN Mapbase
xdate mccid_int
0 2011-10-18 4
1 2011-10-18 5
2 2011-10-18 7
3 2011-10-18 16
4 2011-10-18 17
5 2011-10-18 19
6 2011-10-18 20
7 2011-10-18 25
8 2011-10-18 1
9 2011-10-18 2
Data Set 5: Micro Climate Data.
base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='microclimate-sensors-data'
url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}
response=requests.get(url,params=params)
if response.status_code==200:
url_content=response.content.decode('utf-8')
climate_df=pd.read_csv(StringIO(url_content),delimiter=';')
print(climate_df.head(10))
else:
print(f'Request failed with status code {response.status_code}')
device_id received_at \
0 ICTMicroclimate-08 2025-02-09T00:54:37+00:00
1 ICTMicroclimate-11 2025-02-09T01:02:11+00:00
2 ICTMicroclimate-05 2025-02-09T01:03:24+00:00
3 ICTMicroclimate-01 2025-02-09T01:02:43+00:00
4 ICTMicroclimate-09 2025-02-09T01:17:37+00:00
5 ICTMicroclimate-05 2025-02-09T01:18:26+00:00
6 ICTMicroclimate-02 2025-02-09T01:26:51+00:00
7 ICTMicroclimate-07 2025-02-09T01:35:39+00:00
8 ICTMicroclimate-01 2025-02-09T01:32:44+00:00
9 ICTMicroclimate-04 2025-02-09T01:38:22+00:00
sensorlocation \
0 Swanston St - Tram Stop 13 adjacent Federation...
1 1 Treasury Place
2 Enterprize Park - Pole ID: COM1667
3 Birrarung Marr Park - Pole 1131
4 SkyFarm (Jeff's Shed). Rooftop - Melbourne Con...
5 Enterprize Park - Pole ID: COM1667
6 101 Collins St L11 Rooftop
7 Tram Stop 7C - Melbourne Tennis Centre Precinc...
8 Birrarung Marr Park - Pole 1131
9 Batman Park
latlong minimumwinddirection averagewinddirection \
0 -37.8184515, 144.9678474 0.0 153.0
1 -37.812888, 144.9750857 0.0 144.0
2 -37.8204083, 144.9591192 0.0 45.0
3 -37.8185931, 144.9716404 NaN 150.0
4 -37.8223306, 144.9521696 0.0 241.0
5 -37.8204083, 144.9591192 0.0 357.0
6 -37.814604, 144.9702991 0.0 357.0
7 -37.8222341, 144.9829409 0.0 91.0
8 -37.8185931, 144.9716404 NaN 143.0
9 -37.8221828, 144.9562225 0.0 10.0
maximumwinddirection minimumwindspeed averagewindspeed gustwindspeed \
0 358.0 0.0 3.9 7.9
1 356.0 0.0 2.0 7.8
2 133.0 0.0 1.5 2.7
3 NaN NaN 1.6 NaN
4 359.0 0.0 0.9 4.4
5 32.0 1.6 1.9 2.2
6 359.0 0.0 0.5 1.4
7 356.0 0.0 0.9 4.4
8 NaN NaN 1.9 NaN
9 356.0 0.0 1.5 6.9
airtemperature relativehumidity atmosphericpressure pm25 pm10 \
0 23.9 57.300000 1009.7 0.0 0.0
1 24.5 56.200000 1005.3 0.0 0.0
2 25.0 60.000000 1009.6 1.0 3.0
3 23.1 61.099998 1009.0 0.0 5.0
4 25.6 53.700000 1007.9 0.0 0.0
5 24.5 58.700000 1009.3 1.0 3.0
6 26.6 51.800000 1004.7 1.0 3.0
7 26.6 49.200000 1011.3 0.0 0.0
8 23.9 59.599998 1008.5 0.0 5.0
9 26.5 51.800000 1011.9 0.0 0.0
noise
0 80.500000
1 62.900000
2 68.500000
3 51.700001
4 60.200000
5 68.700000
6 69.200000
7 64.500000
8 53.200001
9 72.300000
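The five loading cells above repeat the same request pattern with only the dataset id changing. As a sketch (not part of the original notebook), that logic could be factored into a single helper:

```python
import requests
import pandas as pd
from io import StringIO

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

def export_url(dataset_id: str) -> str:
    """Build the CSV export endpoint for a dataset."""
    return f'{BASE_URL}{dataset_id}/exports/csv'

def load_dataset(dataset_id: str) -> pd.DataFrame:
    """Download one City of Melbourne dataset as a semicolon-delimited CSV."""
    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    response = requests.get(export_url(dataset_id), params=params)
    response.raise_for_status()  # raise on HTTP failure instead of printing
    return pd.read_csv(StringIO(response.content.decode('utf-8')), delimiter=';')

# e.g. pedestrian_df = load_dataset('pedestrian-counting-system-monthly-counts-per-hour')
```

Using raise_for_status() instead of printing the status code makes a failed download stop the notebook rather than silently leaving the DataFrame undefined.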
Pedestrian Counting System Data.
I applied several data cleaning steps:
- Dropped the 'id', 'location_id', 'direction_1', 'direction_2' and 'location' columns and renamed 'sensing_date' to 'Date'.
- Created 'latitude' and 'longitude' columns from the 'location' column.
pedestrian_df.head(10)
| id | location_id | sensing_date | hourday | direction_1 | direction_2 | pedestriancount | sensor_name | location | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6220230214 | 6 | 2023-02-14 | 2 | 13 | 21 | 34 | FliS_T | -37.81911705, 144.96558255 |
| 1 | 85720220127 | 85 | 2022-01-27 | 7 | 25 | 14 | 39 | 488Mac_T | -37.79432415, 144.92973378 |
| 2 | 8020250505 | 8 | 2025-05-05 | 0 | 2 | 0 | 2 | WebBN_T | -37.82293543, 144.9471751 |
| 3 | 5420240510 | 5 | 2024-05-10 | 4 | 6 | 1 | 7 | PriNW_T | -37.81874249, 144.96787656 |
| 4 | 25620240110 | 25 | 2024-01-10 | 6 | 91 | 106 | 197 | MCEC_T | -37.82401776, 144.95604426 |
| 5 | 671520220214 | 67 | 2022-02-14 | 15 | 198 | 224 | 422 | FLDegS_T | -37.81688755, 144.96562569 |
| 6 | 1611720241226 | 161 | 2024-12-26 | 17 | 54 | 868 | 922 | BirArt1109_T | -37.81851276, 144.97131336 |
| 7 | 18020231016 | 18 | 2023-10-16 | 0 | 0 | 2 | 2 | Col12_T | -37.81344862, 144.97305353 |
| 8 | 871120230710 | 87 | 2023-07-10 | 11 | 67 | 67 | 134 | Errol23_T | -37.80454949, 144.94921863 |
| 9 | 282320221012 | 28 | 2022-10-12 | 23 | 42 | 36 | 78 | VAC_T | -37.82129925, 144.96879309 |
The above output shows the first 10 rows of the pedestrian sensor dataset. Each row represents hourly pedestrian counts recorded by a specific sensor at a particular location and time. This gives an initial view of the structure and granularity of the dataset, confirming that it contains both spatial and temporal information, which is essential for further analysis and visualisation.
pedestrian_df.shape
(2305280, 9)
pedestrian_df.nunique()
id                 2311436
location_id             98
sensing_date          1413
hourday                 24
direction_1           3179
direction_2           3269
pedestriancount       5052
sensor_name             96
location                98
dtype: int64
There are 2,305,280 records and 9 variables (or features), with each row representing a unique observation of pedestrian counts at a specific time and location.
# Split 'location' into 'latitude' and 'longitude'
pedestrian_df[['latitude', 'longitude']] = pedestrian_df['location'].str.split(', ', expand=True)
# Drop 'id', 'location_id', 'direction_1', 'direction_2' and 'location' columns
pedestrian_df = pedestrian_df.drop(columns=['id', 'location_id', 'direction_1','direction_2', 'location'])
# Rename 'Sensing_date' to 'Date'
pedestrian_df = pedestrian_df.rename(columns={'sensing_date': 'Date'})
pedestrian_df.head(10)
| Date | hourday | pedestriancount | sensor_name | latitude | longitude | |
|---|---|---|---|---|---|---|
| 0 | 2022-05-15 | 1 | 276 | ACMI_T | -37.81726338 | 144.96872809 |
| 1 | 2024-09-17 | 17 | 2167 | Eli250_T | -37.81258467 | 144.9625775 |
| 2 | 2021-11-01 | 23 | 14 | Col15_T | -37.81362543 | 144.97323591 |
| 3 | 2023-07-26 | 18 | 650 | Col15_T | -37.81362543 | 144.97323591 |
| 4 | 2025-04-05 | 8 | 431 | Col620_T | -37.81887963 | 144.95449198 |
| 5 | 2024-02-24 | 3 | 18 | Swa607_T | -37.804024 | 144.96308399 |
| 6 | 2025-03-03 | 4 | 1 | Lyg309_T | -37.79808192 | 144.96721013 |
| 7 | 2025-05-08 | 0 | 71 | Spencer_T | -37.821728 | 144.95557015 |
| 8 | 2022-12-10 | 17 | 2800 | Bou283_T | -37.81380668 | 144.96516718 |
| 9 | 2023-10-18 | 13 | 407 | AlfPl_T | -37.81379749 | 144.96995745 |
The location column contains both latitude and longitude as a single string, so it is split into two separate columns, latitude and longitude, which makes it easier to work with coordinates for mapping and spatial analysis. The other steps clean and streamline the dataset by removing redundant information, improving clarity, and preparing it for visualisation, filtering, or analysis of time- and location-based pedestrian patterns.
Adding Street Names to the Pedestrian Data set
# Loading Street Names
data = """
latitude longitude Street
-37.811 144.9643 Swanston St
-37.8213 144.9688 St Kilda Rd
-37.8169 144.9656 Flinders Ln
-37.8112 144.9666 Lonsdale St
-37.8127 144.9539 King St
-37.8146 144.9429 Docklands
-37.8189 144.9545 Collins St
-37.8124 144.9655 Swanston St
-37.8191 144.9656 Flinders Walk
-37.8133 144.9668 Bourke St
-37.8198 144.951 Collins St
-37.8083 144.963 A Beckett St
-37.82 144.9687 St Kilda Rd
-37.8165 144.9612 Queen St
-37.8063 144.9587 Victoria St
-37.8163 144.9709 Flinders St
-37.8141 144.9661 Swanston St
-37.8188 144.9471 Bourke St
-37.8169 144.9656 Flinders Ln
-37.8187 144.9679 Federation Square
-37.8031 144.9491 Queensberry St
-37.8196 144.9633 Flinders Walk
-37.82 144.9598 St Kilda Rd
-37.8077 144.9631 Swanston St
-37.8127 144.9679 King St
-37.7945 144.9304 Macaulay Rd
-37.8169 144.9536 Flinders Ln
-37.818 144.965 Flinders St
-37.8134 144.9731 Collins St
-37.7981 144.9672 Lygon St
-37.8156 144.9397 Docklands
-37.8046 144.9495 Errol St
-37.8179 144.9662 Flinders St
-37.8173 144.9687 Russell St
-37.813 144.9516 La Trobe St
-37.8205 144.9413 Bourke St
-37.81 144.9622 La Trobe St
-37.8144 144.9443 Harbour Esplanade
-37.8061 144.9564 Victoria St
-37.8126 144.9626 Elizabeth St
-37.8136 144.9732 Spring Street
-37.8017 144.9666 Lygon St
-37.7984 144.9641 Monash Rd
-37.8229 144.9472 Navigation Dr
-37.8217 144.9556 Rebecca Walk
-37.8124 144.9714 Swanston St
-37.8045 144.9492 Errol St
-37.8153 144.9523 Lonsdale St
-37.8201 144.9576 King St
-37.8001 144.9639 Swanston St
-37.81 144.9723 La Trobe St
-37.8197 144.968 Arts Centre Melbourne
-37.804 144.9631 Swanston St
-37.8106 144.9644 Little Lonsdale St
-37.8168 144.9656 Flinders Ln
-37.8157 144.9668 Swanston St
-37.8024 144.9616 Pelham St
-37.8202 144.9651 Southgate Ave
-37.8173 144.9532 Federation Square
-37.8125 144.9619 Lonsdale St
-37.8152 144.9747 Flinders St
-37.8167 144.9669 Swanston St
-37.8084 144.9591 Franklin St
-37.8011 144.967 Lygon St
-37.824 144.956 Convention Centre Pl
-37.813 144.9516 La Trobe St
-37.8138 144.9652 Bourke St
-37.8147 144.9447 Docklands
-37.8123 144.9615 Lonsdale St
-37.8169 144.953 Flinders Ln
-37.8156 144.9655 Docklands
-37.8073 144.9596 Elizabeth St
-37.8201 144.9629 King St
-37.813 144.9568 La Trobe St
-37.8189 144.9461 Collins St
-37.8135 144.9652 Bourke St
-37.8117 144.9682 Little Bourke St
-37.8184 144.9736 Batman Ave
-37.8149 144.9661 Swanston St
-37.8119 144.9562 Flagstaff Station
-37.8138 144.97 Bourke St
-37.8259 144.9619 Balston St
-37.8163 144.9709 Flinders St
-37.8074 144.9599 Elizabeth St
-37.8191 144.9545 Flinders Walk
-37.809 144.9493 Spencer St
-37.795 144.9353 Macaulay Rd
-37.8185 144.9713 Princes Walk
-37.8125 144.9569 Lonsdale St
-37.7943 144.9297 Macaulay Rd
-37.797 144.9644 Elgin St
-37.8239 144.963 Power St
-37.8175 144.9733 Batman Ave
-37.82 144.9583 St Kilda Rd
-37.8163 144.9555 Flinders St
-37.8176 144.9733 Batman Ave
-37.8095 144.9494 State Route
-37.8101 144.9614 La Trobe St
"""
# Load into a DataFrame (split each line into latitude, longitude and street name,
# so street names containing spaces stay intact)
rows = [line.split(maxsplit=2) for line in data.strip().splitlines()[1:]]
street_df = pd.DataFrame(rows, columns=['latitude', 'longitude', 'Street'])
# Displaying Data Frame
street_df.head()
| latitude | longitude | Street | |
|---|---|---|---|
| 0 | -37.8110 | 144.9643 | Swanston St |
| 1 | -37.8213 | 144.9688 | St Kilda Rd |
| 2 | -37.8169 | 144.9656 | Flinders Ln |
| 3 | -37.8112 | 144.9666 | Lonsdale St |
| 4 | -37.8127 | 144.9539 | King St |
Street names from this reference table were then mapped onto the pedestrian dataset.
# Change latitude and longitude to numeric values
pedestrian_df['latitude'] = pd.to_numeric(pedestrian_df['latitude'], errors='coerce')
pedestrian_df['longitude'] = pd.to_numeric(pedestrian_df['longitude'], errors='coerce')
street_df['latitude'] = pd.to_numeric(street_df['latitude'], errors='coerce')
street_df['longitude'] = pd.to_numeric(street_df['longitude'], errors='coerce')
# Dropping rows with missing coordinate values
pedestrian_df = pedestrian_df.dropna(subset=['latitude', 'longitude'])
street_df = street_df.dropna(subset=['latitude', 'longitude'])
# Rounding off coordinates
pedestrian_df['lat_round'] = pedestrian_df['latitude'].round(3)
pedestrian_df['lon_round'] = pedestrian_df['longitude'].round(3)
street_df['lat_round'] = street_df['latitude'].round(3)
street_df['lon_round'] = street_df['longitude'].round(3)
# Merge pedestrian data set with rounded coordinates
pedestrian_df_N = pd.merge(
pedestrian_df,
street_df[['lat_round', 'lon_round', 'Street']],
on=['lat_round', 'lon_round'],
how='left'
)
pedestrian_df_N = pedestrian_df_N.drop(columns=['lat_round', 'lon_round'])
# Displaying result
pedestrian_df_N.head()
| Date | hourday | pedestriancount | sensor_name | latitude | longitude | Street | |
|---|---|---|---|---|---|---|---|
| 0 | 2022-05-15 | 1 | 276 | ACMI_T | -37.817263 | 144.968728 | Russell St |
| 1 | 2024-09-17 | 17 | 2167 | Eli250_T | -37.812585 | 144.962578 | Elizabeth St |
| 2 | 2021-11-01 | 23 | 14 | Col15_T | -37.813625 | 144.973236 | Spring Street |
| 3 | 2023-07-26 | 18 | 650 | Col15_T | -37.813625 | 144.973236 | Spring Street |
| 4 | 2025-04-05 | 8 | 431 | Col620_T | -37.818880 | 144.954492 | Collins St |
The street names were mapped to the pedestrian data set.
Tree Canopies Data.
I applied several data cleaning steps:
- Dropped the 'geo_point_2d' and 'objectid' columns.
- Created 'latitude' and 'longitude' columns from the 'geo_point_2d' column.
tree_canopies_df.head(10)
| geo_point_2d | geo_shape | objectid | shape_leng | shape_area | |
|---|---|---|---|---|---|
| 0 | -37.81304517121492, 144.98612858745977 | {"coordinates": [[[[144.98613240697972, -37.81... | 10373 | 2.692370 | 0.488406 |
| 1 | -37.813031352270215, 144.98264073647684 | {"coordinates": [[[[144.98267255431483, -37.81... | 10379 | 55.155123 | 125.461002 |
| 2 | -37.81261020314892, 144.96112288812233 | {"coordinates": [[[[144.96112403835852, -37.81... | 10380 | 6.279844 | 2.816221 |
| 3 | -37.81219284514014, 144.93846977801448 | {"coordinates": [[[[144.93847665550007, -37.81... | 10399 | 7.048844 | 3.643475 |
| 4 | -37.81239953857732, 144.95122560445583 | {"coordinates": [[[[144.95122528646937, -37.81... | 10400 | 2.794252 | 0.612298 |
| 5 | -37.813040580695024, 144.98654806873841 | {"coordinates": [[[[144.98655398642268, -37.81... | 10385 | 4.334477 | 1.348686 |
| 6 | -37.81231922742188, 144.9447777601162 | {"coordinates": [[[[144.94478614151308, -37.81... | 10387 | 8.128402 | 4.911725 |
| 7 | -37.81218994603368, 144.94262980622725 | {"coordinates": [[[[144.9426325334389, -37.812... | 10438 | 7.923251 | 4.257095 |
| 8 | -37.81245033141797, 144.98815520131134 | {"coordinates": [[[[144.98817843816417, -37.81... | 10669 | 10.680974 | 4.845412 |
| 9 | -37.81244314561024, 144.9495590639178 | {"coordinates": [[[[144.94963279484296, -37.81... | 10393 | 60.743893 | 165.844186 |
The above table displays the first 10 entries of the tree canopy dataset, which provides spatial data on tree canopy coverage in Melbourne. Each row corresponds to a specific tree canopy polygon. This dataset is essential for spatial analysis and visualisation of tree cover in relation to other urban features like pedestrian movement, public transport access, or heat mapping.
tree_canopies_df.shape
(32787, 5)
There are 32,787 individual tree canopy records, each representing a unique canopy area in the city and the dataset includes five attributes per record.
tree_canopies_df.nunique()
geo_point_2d 32787 geo_shape 32785 objectid 32787 shape_leng 32737 shape_area 32740 dtype: int64
# Split 'geo_point_2d' into 'latitude' and 'longitude'
tree_canopies_df[['latitude', 'longitude']] = tree_canopies_df['geo_point_2d'].str.split(', ', expand=True)
# Drop 'objectid' and 'geo_point_2d' columns
tree_canopies_df = tree_canopies_df.drop(columns=['objectid', 'geo_point_2d'])
tree_canopies_df.head(10)
| geo_shape | shape_leng | shape_area | latitude | longitude | |
|---|---|---|---|---|---|
| 0 | {"coordinates": [[[[144.98613240697972, -37.81... | 2.692370 | 0.488406 | -37.81304517121492 | 144.98612858745977 |
| 1 | {"coordinates": [[[[144.98267255431483, -37.81... | 55.155123 | 125.461002 | -37.813031352270215 | 144.98264073647684 |
| 2 | {"coordinates": [[[[144.96112403835852, -37.81... | 6.279844 | 2.816221 | -37.81261020314892 | 144.96112288812233 |
| 3 | {"coordinates": [[[[144.93847665550007, -37.81... | 7.048844 | 3.643475 | -37.81219284514014 | 144.93846977801448 |
| 4 | {"coordinates": [[[[144.95122528646937, -37.81... | 2.794252 | 0.612298 | -37.81239953857732 | 144.95122560445583 |
| 5 | {"coordinates": [[[[144.98655398642268, -37.81... | 4.334477 | 1.348686 | -37.813040580695024 | 144.98654806873841 |
| 6 | {"coordinates": [[[[144.94478614151308, -37.81... | 8.128402 | 4.911725 | -37.81231922742188 | 144.9447777601162 |
| 7 | {"coordinates": [[[[144.9426325334389, -37.812... | 7.923251 | 4.257095 | -37.81218994603368 | 144.94262980622725 |
| 8 | {"coordinates": [[[[144.98817843816417, -37.81... | 10.680974 | 4.845412 | -37.81245033141797 | 144.98815520131134 |
| 9 | {"coordinates": [[[[144.94963279484296, -37.81... | 60.743893 | 165.844186 | -37.81244314561024 | 144.9495590639178 |
The geo_point_2d column stores each location as a single string ("latitude, longitude"). Splitting it into separate latitude and longitude columns makes the data more convenient for mapping and spatial joins.
The objectid and geo_point_2d columns are then removed to clean the dataset.
The above steps streamline the dataset down to the essential information, such as canopy area, shape length, and geographic coordinates. This makes the data easier to work with for visualisation and spatial analysis.
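Note that `str.split` produces *string* columns, so a numeric conversion is usually needed before any distance or mapping work. A minimal sketch with synthetic rows (not the real `tree_canopies_df`):

```python
import pandas as pd

# Synthetic stand-in for the 'geo_point_2d' column used in the notebook
df = pd.DataFrame({"geo_point_2d": ["-37.813, 144.986", "-37.812, 144.961"]})

# Split the "lat, lon" string into two columns (these are still strings)
df[["latitude", "longitude"]] = df["geo_point_2d"].str.split(", ", expand=True)

# Convert to floats so the coordinates can be used in spatial calculations
df["latitude"] = pd.to_numeric(df["latitude"], errors="coerce")
df["longitude"] = pd.to_numeric(df["longitude"], errors="coerce")

print(df.dtypes["latitude"])  # float64
```

The notebook performs this numeric conversion later (before building the heatmap); doing it immediately after the split avoids surprises in intermediate steps.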
Bus Stops Data.¶
I performed several data cleaning steps:
- Dropped the 'geo_shape', 'prop_id', 'geo_point_2d', 'addresspt1', 'addressp_1', 'asset_clas', 'asset_type', 'objectid', 'str_id', 'addresspt', 'asset_subt', 'model_desc', 'mcc_id', 'roadseg_id', 'descriptio', and 'model_no' columns.
- Created 'latitude' and 'longitude' columns from the 'geo_point_2d' column.
- Added a 'stop_type' column.
- Removed duplicates.
bus_stops_df.head(10)
| geo_point_2d | geo_shape | prop_id | addresspt1 | addressp_1 | asset_clas | asset_type | objectid | str_id | addresspt | asset_subt | model_desc | mcc_id | roadseg_id | descriptio | model_no | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.80384165792465, 144.93239283833262 | {"coordinates": [144.93239283833262, -37.80384... | 0 | 76.819824 | 357 | Signage | Sign - Public Transport | 355 | 1235255 | 570648 | NaN | Sign - Public Transport 1 Panel | 1235255 | 21673 | Sign - Public Transport 1 Panel Bus Stop Type 13 | P.16 |
| 1 | -37.81548699581418, 144.9581794249902 | {"coordinates": [144.9581794249902, -37.815486... | 0 | 21.561304 | 83 | Signage | Sign - Public Transport | 600 | 1231226 | 548056 | NaN | Sign - Public Transport 1 Panel | 1231226 | 20184 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
| 2 | -37.81353897396532, 144.95728334230756 | {"coordinates": [144.95728334230756, -37.81353... | 0 | 42.177187 | 207 | Signage | Sign - Public Transport | 640 | 1237092 | 543382 | NaN | Sign - Public Transport 1 Panel | 1237092 | 20186 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
| 3 | -37.82191394843844, 144.95539345270072 | {"coordinates": [144.95539345270072, -37.82191... | 0 | 15.860434 | 181 | Signage | Sign - Public Transport | 918 | 1232777 | 103975 | NaN | Sign - Public Transport 1 Panel | 1232777 | 22174 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
| 4 | -37.83316401267591, 144.97443745130263 | {"coordinates": [144.97443745130263, -37.83316... | 0 | 0.000000 | 0 | Signage | Sign - Public Transport | 1029 | 1271914 | 0 | NaN | Sign - Public Transport 1 Panel | 1271914 | 22708 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
| 5 | -37.79436108568101, 144.92998424529242 | {"coordinates": [144.92998424529242, -37.79436... | 0 | 3.105722 | 112 | Signage | Sign - Public Transport | 1139 | 1577059 | 616011 | NaN | Sign - Public Transport 1 Panel | 1577059 | 21693 | Sign - Public Transport 1 Panel Bus Stop Type 1 | P.16 |
| 6 | -37.817452093555325, 144.96168480565794 | {"coordinates": [144.96168480565794, -37.81745... | 0 | 7.239726 | 268 | Signage | Sign - Public Transport | 1263 | 1481028 | 527371 | NaN | Sign - Public Transport 1 Panel | 1481028 | 20171 | Sign - Public Transport 1 Panel Bus Stop Type 3 | P.16 |
| 7 | -37.82146476463953, 144.9303191551562 | {"coordinates": [144.9303191551562, -37.821464... | 0 | 32.180664 | 298 | Signage | Sign - Public Transport | 2527 | 1245221 | 110521 | NaN | Sign - Public Transport 1 Panel | 1245221 | 30638 | Sign - Public Transport 1 Panel Bus Stop Type 3 | P.16 |
| 8 | -37.837547087144706, 144.98191138368836 | {"coordinates": [144.98191138368836, -37.83754... | 0 | 41.441167 | 78 | Signage | Sign - Public Transport | 2922 | 1248743 | 107419 | NaN | Sign - Public Transport 1 Panel | 1248743 | 22245 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
| 9 | -37.812490976626215, 144.95370614040704 | {"coordinates": [144.95370614040704, -37.81249... | 0 | 16.143764 | 99 | Signage | Sign - Public Transport | 5111 | 1253565 | 602160 | NaN | Sign - Public Transport 1 Panel | 1253565 | 20030 | Sign - Public Transport 1 Panel Bus Stop Type 8 | P.16 |
The table above shows the first 10 rows of the bus stops dataset, which includes geographic and descriptive details about public transport signage (bus stops) in the area. Each row represents one bus stop sign.
bus_stops_df.shape
(309, 16)
The bus_stops_df dataset contains 309 rows and 16 columns, representing detailed information for 309 individual bus stop assets.
bus_stops_df.nunique()
geo_point_2d 295 geo_shape 295 prop_id 6 addresspt1 274 addressp_1 193 asset_clas 1 asset_type 1 objectid 309 str_id 309 addresspt 241 asset_subt 0 model_desc 1 mcc_id 309 roadseg_id 198 descriptio 8 model_no 1 dtype: int64
# Split 'geo_point_2d' into 'latitude' and 'longitude'
bus_stops_df[['latitude', 'longitude']] = bus_stops_df['geo_point_2d'].str.split(', ', expand=True)
# Drop 'geo_shape', 'prop_id', 'geo_point_2d' , 'addresspt1', addressp_1 'asset_clas', 'asset_type', 'objectid', 'str_id','addresspt','asset_subt','model_desc','mcc_id' ,'roadseg_id', 'descriptio', and 'model_no' columns
bus_stops_df = bus_stops_df.drop(columns=['geo_shape', 'geo_point_2d','prop_id', 'addresspt1', 'addressp_1', 'asset_clas', 'asset_type', 'objectid', 'str_id','addresspt','asset_subt','model_desc','mcc_id' ,'roadseg_id', 'descriptio', 'model_no' ])
#Added Stop Type column
bus_stops_df['stop_type'] = 'Bus Stop'
#Remove duplicates
bus_stops_df = bus_stops_df.drop_duplicates(subset=['latitude', 'longitude'])
bus_stops_df.head(10)
| latitude | longitude | stop_type | |
|---|---|---|---|
| 0 | -37.80384165792465 | 144.93239283833262 | Bus Stop |
| 1 | -37.81548699581418 | 144.9581794249902 | Bus Stop |
| 2 | -37.81353897396532 | 144.95728334230756 | Bus Stop |
| 3 | -37.82191394843844 | 144.95539345270072 | Bus Stop |
| 4 | -37.83316401267591 | 144.97443745130263 | Bus Stop |
| 5 | -37.79436108568101 | 144.92998424529242 | Bus Stop |
| 6 | -37.817452093555325 | 144.96168480565794 | Bus Stop |
| 7 | -37.82146476463953 | 144.9303191551562 | Bus Stop |
| 8 | -37.837547087144706 | 144.98191138368836 | Bus Stop |
| 9 | -37.812490976626215 | 144.95370614040704 | Bus Stop |
I split the geo_point_2d column into two separate columns, latitude and longitude, making the geographic data easier to work with, and removed columns that are irrelevant or redundant for the analysis, keeping only latitude, longitude, and stop_type. A new column, stop_type, was added and set to 'Bus Stop' for all rows to identify the type of asset. Finally, I removed any duplicate bus stops sharing the same coordinates, ensuring each bus stop is unique by location.
bus_stops_df.shape
(295, 3)
bus_stops_df.nunique()
latitude 295 longitude 295 stop_type 1 dtype: int64
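The reduction from 309 to 295 rows comes from the coordinate-based deduplication. A small sketch with made-up coordinates (not the real `bus_stops_df`) showing how `drop_duplicates` on the (latitude, longitude) pair keeps only the first record per location:

```python
import pandas as pd

# Synthetic stops: the first two share the same coordinates
stops = pd.DataFrame({
    "latitude":  ["-37.80", "-37.80", "-37.81"],
    "longitude": ["144.93", "144.93", "144.95"],
    "stop_type": ["Bus Stop"] * 3,
})

# Deduplicate on the coordinate pair; the first occurrence is retained
unique_stops = stops.drop_duplicates(subset=["latitude", "longitude"])

print(len(unique_stops))  # 2
```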
Tram Stops Data.¶
I performed several data cleaning steps:
- Dropped the 'geo_shape', 'geo_point_2d', 'xorg', 'stop_no', 'mccid_str', 'xsource', 'xdate', 'mccid_int', and 'name' columns.
- Created 'latitude' and 'longitude' columns from the 'geo_point_2d' column.
- Added a 'stop_type' column.
tram_stops_df.head(10)
| geo_point_2d | geo_shape | name | xorg | stop_no | mccid_str | xsource | xdate | mccid_int | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.82023778673241, 144.95786314283018 | {"coordinates": [144.95786314283018, -37.82023... | Melbourne Aquarium / Flinders Street | GIS Team | 2 | NaN | Mapbase | 2011-10-18 | 4 |
| 1 | -37.82097269970027, 144.95546153614245 | {"coordinates": [144.95546153614245, -37.82097... | Spencer Street / Flinders Street | GIS Team | 1 | NaN | Mapbase | 2011-10-18 | 5 |
| 2 | -37.82190465062153, 144.95109855638137 | {"coordinates": [144.95109855638137, -37.82190... | The Goods Shed / Wurundjeri Way | GIS Team | D5 | NaN | Mapbase | 2011-10-18 | 7 |
| 3 | -37.811771476718356, 144.95644059700524 | {"coordinates": [144.95644059700524, -37.81177... | William Street / La Trobe Street | GIS Team | 3 | NaN | Mapbase | 2011-10-18 | 16 |
| 4 | -37.81105928060848, 144.95891745116262 | {"coordinates": [144.95891745116262, -37.81105... | Queen Street / La Trobe Street | GIS Team | 4 | NaN | Mapbase | 2011-10-18 | 17 |
| 5 | -37.80961884837298, 144.96384957029932 | {"coordinates": [144.96384957029932, -37.80961... | Swanston Street / La Trobe Street | GIS Team | 6 | NaN | Mapbase | 2011-10-18 | 19 |
| 6 | -37.808876998255194, 144.96634474519394 | {"coordinates": [144.96634474519394, -37.80887... | Russell Street / La Trobe Street | GIS Team | 7 | NaN | Mapbase | 2011-10-18 | 20 |
| 7 | -37.81358116790275, 144.97406360491075 | {"coordinates": [144.97406360491075, -37.81358... | Parliament / Collins Street | GIS Team | 8 | NaN | Mapbase | 2011-10-18 | 25 |
| 8 | -37.8176316450406, 144.96690455927876 | {"coordinates": [144.96690455927876, -37.81763... | Swanston Street / Flinders Street | GIS Team | 5 | NaN | Mapbase | 2011-10-18 | 1 |
| 9 | -37.818324403770184, 144.964479208357 | {"coordinates": [144.964479208357, -37.8183244... | Elizabeth Street / Flinders Street | GIS Team | 4 | NaN | Mapbase | 2011-10-18 | 2 |
The tram_stops_df dataset includes information about tram stop locations in Melbourne. It contains columns like geo_point_2d, which represents the latitude and longitude of each stop, and geo_shape, which holds the geospatial shape data in JSON format. Other columns, such as name, provide the tram stop’s description, while stop_no represents its unique identifier. The dataset also includes metadata like the data source (xsource), collection date (xdate), and identifiers in both string and integer formats (mccid_str and mccid_int). This data can be used for analyzing tram stop locations and other related insights.
tram_stops_df.shape
(28, 9)
The shape of the tram_stops_df dataset is (28, 9): it has 28 rows and 9 columns.
tram_stops_df.nunique()
geo_point_2d 28 geo_shape 28 name 28 xorg 1 stop_no 18 mccid_str 0 xsource 1 xdate 1 mccid_int 28 dtype: int64
# Split 'geo_point_2d' into 'latitude' and 'longitude'
tram_stops_df[['latitude', 'longitude']] = tram_stops_df['geo_point_2d'].str.split(', ', expand=True)
# Drop 'geo_shape', 'geo_point_2d', 'xorg', 'stop_no', 'mccid_str', 'xsource', 'xdate', 'mccid_int', and 'name' columns
tram_stops_df = tram_stops_df.drop(columns=['geo_shape', 'geo_point_2d','xorg', 'stop_no', 'mccid_str', 'xsource', 'xdate', 'mccid_int', 'name'])
#Added Stop Type column
tram_stops_df['stop_type'] = 'Tram Stop'
tram_stops_df.head(10)
| latitude | longitude | stop_type | |
|---|---|---|---|
| 0 | -37.82023778673241 | 144.95786314283018 | Tram Stop |
| 1 | -37.82097269970027 | 144.95546153614245 | Tram Stop |
| 2 | -37.82190465062153 | 144.95109855638137 | Tram Stop |
| 3 | -37.811771476718356 | 144.95644059700524 | Tram Stop |
| 4 | -37.81105928060848 | 144.95891745116262 | Tram Stop |
| 5 | -37.80961884837298 | 144.96384957029932 | Tram Stop |
| 6 | -37.808876998255194 | 144.96634474519394 | Tram Stop |
| 7 | -37.81358116790275 | 144.97406360491075 | Tram Stop |
| 8 | -37.8176316450406 | 144.96690455927876 | Tram Stop |
| 9 | -37.818324403770184 | 144.964479208357 | Tram Stop |
The following actions were performed to clean the dataset:
- Splitting geo_point_2d into latitude and longitude: this extracts the latitude and longitude values from the geo_point_2d column, which holds a string such as "-37.82023778673241, 144.95786314283018", and assigns them to new latitude and longitude columns.
- Dropping unnecessary columns: geo_shape, geo_point_2d, xorg, stop_no, mccid_str, xsource, xdate, mccid_int, and name are dropped as they are no longer needed for the analysis.
- Adding a stop type: a new column called stop_type is added, with the value 'Tram Stop' for all rows, indicating that these are tram stop locations.
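Because the bus and tram tables now share the same schema (latitude, longitude, stop_type), they can be stacked into one table for joint mapping. A minimal sketch with synthetic rows (the real frames are `bus_stops_df` and `tram_stops_df` above):

```python
import pandas as pd

# Synthetic one-row versions of the cleaned bus and tram stop tables
bus = pd.DataFrame({"latitude": ["-37.80"], "longitude": ["144.93"],
                    "stop_type": ["Bus Stop"]})
tram = pd.DataFrame({"latitude": ["-37.82"], "longitude": ["144.95"],
                     "stop_type": ["Tram Stop"]})

# Stack them; stop_type distinguishes the two networks on a shared map
all_stops = pd.concat([bus, tram], ignore_index=True)

print(all_stops["stop_type"].value_counts().to_dict())
```

This is exactly what the shared stop_type column enables: one combined layer of public transport stops, colour-coded by type.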
Climate Data.¶
I performed several data cleaning steps:
- Created 'latitude' and 'longitude' columns from the 'latlong' column.
- Created 'date', 'time', and 'hour' columns from the 'received_at' column.
- Dropped the 'device_id', 'received_at', and 'latlong' columns.
- Filled null values with 0.
climate_df.head(10)
| device_id | received_at | sensorlocation | latlong | minimumwinddirection | averagewinddirection | maximumwinddirection | minimumwindspeed | averagewindspeed | gustwindspeed | airtemperature | relativehumidity | atmosphericpressure | pm25 | pm10 | noise | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ICTMicroclimate-08 | 2025-02-09T00:54:37+00:00 | Swanston St - Tram Stop 13 adjacent Federation... | -37.8184515, 144.9678474 | 0.0 | 153.0 | 358.0 | 0.0 | 3.9 | 7.9 | 23.9 | 57.300000 | 1009.7 | 0.0 | 0.0 | 80.500000 |
| 1 | ICTMicroclimate-11 | 2025-02-09T01:02:11+00:00 | 1 Treasury Place | -37.812888, 144.9750857 | 0.0 | 144.0 | 356.0 | 0.0 | 2.0 | 7.8 | 24.5 | 56.200000 | 1005.3 | 0.0 | 0.0 | 62.900000 |
| 2 | ICTMicroclimate-05 | 2025-02-09T01:03:24+00:00 | Enterprize Park - Pole ID: COM1667 | -37.8204083, 144.9591192 | 0.0 | 45.0 | 133.0 | 0.0 | 1.5 | 2.7 | 25.0 | 60.000000 | 1009.6 | 1.0 | 3.0 | 68.500000 |
| 3 | ICTMicroclimate-01 | 2025-02-09T01:02:43+00:00 | Birrarung Marr Park - Pole 1131 | -37.8185931, 144.9716404 | NaN | 150.0 | NaN | NaN | 1.6 | NaN | 23.1 | 61.099998 | 1009.0 | 0.0 | 5.0 | 51.700001 |
| 4 | ICTMicroclimate-09 | 2025-02-09T01:17:37+00:00 | SkyFarm (Jeff's Shed). Rooftop - Melbourne Con... | -37.8223306, 144.9521696 | 0.0 | 241.0 | 359.0 | 0.0 | 0.9 | 4.4 | 25.6 | 53.700000 | 1007.9 | 0.0 | 0.0 | 60.200000 |
| 5 | ICTMicroclimate-05 | 2025-02-09T01:18:26+00:00 | Enterprize Park - Pole ID: COM1667 | -37.8204083, 144.9591192 | 0.0 | 357.0 | 32.0 | 1.6 | 1.9 | 2.2 | 24.5 | 58.700000 | 1009.3 | 1.0 | 3.0 | 68.700000 |
| 6 | ICTMicroclimate-02 | 2025-02-09T01:26:51+00:00 | 101 Collins St L11 Rooftop | -37.814604, 144.9702991 | 0.0 | 357.0 | 359.0 | 0.0 | 0.5 | 1.4 | 26.6 | 51.800000 | 1004.7 | 1.0 | 3.0 | 69.200000 |
| 7 | ICTMicroclimate-07 | 2025-02-09T01:35:39+00:00 | Tram Stop 7C - Melbourne Tennis Centre Precinc... | -37.8222341, 144.9829409 | 0.0 | 91.0 | 356.0 | 0.0 | 0.9 | 4.4 | 26.6 | 49.200000 | 1011.3 | 0.0 | 0.0 | 64.500000 |
| 8 | ICTMicroclimate-01 | 2025-02-09T01:32:44+00:00 | Birrarung Marr Park - Pole 1131 | -37.8185931, 144.9716404 | NaN | 143.0 | NaN | NaN | 1.9 | NaN | 23.9 | 59.599998 | 1008.5 | 0.0 | 5.0 | 53.200001 |
| 9 | ICTMicroclimate-04 | 2025-02-09T01:38:22+00:00 | Batman Park | -37.8221828, 144.9562225 | 0.0 | 10.0 | 356.0 | 0.0 | 1.5 | 6.9 | 26.5 | 51.800000 | 1011.9 | 0.0 | 0.0 | 72.300000 |
The climate_df dataframe contains data from environmental sensors located in different parts of Melbourne, tracking various climate variables. This dataset will be used to analyze environmental conditions in different areas over time.
climate_df.shape
(330961, 16)
The climate_df dataframe has a shape of (330961, 16), meaning it contains 330,961 rows and 16 columns. This indicates that there are 330,961 individual climate measurements across the 16 features recorded by the sensors.
climate_df.nunique()
device_id 12 received_at 331169 sensorlocation 11 latlong 12 minimumwinddirection 360 averagewinddirection 360 maximumwinddirection 361 minimumwindspeed 409 averagewindspeed 102 gustwindspeed 304 airtemperature 543 relativehumidity 1551 atmosphericpressure 1644 pm25 528 pm10 119 noise 1115 dtype: int64
# Converting 'received_at' to datetime format
climate_df['received_at'] = pd.to_datetime(climate_df['received_at'])
# Creating separate columns for date and time
climate_df['date'] = climate_df['received_at'].dt.date
climate_df['time'] = climate_df['received_at'].dt.time
# Filling null values with 0
climate_df.fillna(0, inplace=True)
# Splitting 'latlong' into 'latitude' and 'longitude'
climate_df[['latitude', 'longitude']] = climate_df['latlong'].str.split(', ', expand=True)
# Dropping 'device_id', 'received_at', and 'latlong' columns
climate_df = climate_df.drop(columns=['device_id', 'received_at', 'latlong'])
climate_df['hour'] = pd.to_datetime(climate_df['time'], format='%H:%M:%S').dt.hour
# Printing the results
climate_df.head()
| sensorlocation | minimumwinddirection | averagewinddirection | maximumwinddirection | minimumwindspeed | averagewindspeed | gustwindspeed | airtemperature | relativehumidity | atmosphericpressure | pm25 | pm10 | noise | date | time | latitude | longitude | hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Swanston St - Tram Stop 13 adjacent Federation... | 0.0 | 153.0 | 358.0 | 0.0 | 3.9 | 7.9 | 23.9 | 57.300000 | 1009.7 | 0.0 | 0.0 | 80.500000 | 2025-02-09 | 00:54:37 | -37.8184515 | 144.9678474 | 0 |
| 1 | 1 Treasury Place | 0.0 | 144.0 | 356.0 | 0.0 | 2.0 | 7.8 | 24.5 | 56.200000 | 1005.3 | 0.0 | 0.0 | 62.900000 | 2025-02-09 | 01:02:11 | -37.812888 | 144.9750857 | 1 |
| 2 | Enterprize Park - Pole ID: COM1667 | 0.0 | 45.0 | 133.0 | 0.0 | 1.5 | 2.7 | 25.0 | 60.000000 | 1009.6 | 1.0 | 3.0 | 68.500000 | 2025-02-09 | 01:03:24 | -37.8204083 | 144.9591192 | 1 |
| 3 | Birrarung Marr Park - Pole 1131 | 0.0 | 150.0 | 0.0 | 0.0 | 1.6 | 0.0 | 23.1 | 61.099998 | 1009.0 | 0.0 | 5.0 | 51.700001 | 2025-02-09 | 01:02:43 | -37.8185931 | 144.9716404 | 1 |
| 4 | SkyFarm (Jeff's Shed). Rooftop - Melbourne Con... | 0.0 | 241.0 | 359.0 | 0.0 | 0.9 | 4.4 | 25.6 | 53.700000 | 1007.9 | 0.0 | 0.0 | 60.200000 | 2025-02-09 | 01:17:37 | -37.8223306 | 144.9521696 | 1 |
The received_at column was converted to datetime format and split into separate date and time columns, missing values were filled with 0, and the latlong column was split into latitude and longitude. The device_id, received_at, and latlong columns were then dropped to simplify the dataframe. The result is a clean dataframe containing the information required for analysis.
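With an hour column in place, per-hour summaries across sensors become a one-liner. A hedged sketch with made-up readings (not the real `climate_df`):

```python
import pandas as pd

# Synthetic sensor readings with the derived 'hour' column
climate = pd.DataFrame({
    "hour": [0, 1, 1, 2],
    "airtemperature": [23.9, 24.5, 25.0, 26.0],
})

# Average air temperature per hour of the day, across all sensors
hourly_temp = climate.groupby("hour")["airtemperature"].mean()

print(hourly_temp.loc[1])  # 24.75
```

The same pattern extends to relativehumidity, pm25, or noise, which is what later steps need when relating weather conditions to pedestrian counts.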
Data Visualisation - Pedestrian Counting System Data.¶
Plotting the Time Series of Pedestrian Count
# Plotting Daily Pedestrian Distribution
pedestrian_df_N.plot(
x='Date',
y='pedestriancount',
figsize=(12, 4),
title='Daily Pedestrian Count',
legend=False,
color="#ffb84d"
)
plt.xlabel("Date")
plt.ylabel("Pedestrian Count")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
According to the above time series plot:
- There is an overall increasing trend, suggesting that pedestrian activity has generally risen over time.
- Periodic fluctuations may indicate seasonal effects such as weekends, holidays, or special events.
- Some days show extreme spikes in pedestrian traffic; these could correspond to events, festivals, or special occasions that caused a surge in foot traffic.
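One common way to separate the underlying trend from the weekly fluctuations noted above is a 7-day rolling mean. A minimal sketch on synthetic daily counts (not the real `pedestrian_df_N`):

```python
import pandas as pd

# Synthetic daily pedestrian counts over two weeks
dates = pd.date_range("2024-01-01", periods=14, freq="D")
counts = pd.Series(range(100, 114), index=dates)

# 7-day rolling mean smooths out day-of-week spikes and dips
trend = counts.rolling(window=7).mean()

print(trend.iloc[-1])  # mean of the last 7 days: 110.0
```

Overlaying this smoothed series on the daily plot would make the overall trend easier to read than the raw counts alone.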
Plotting Monthly Pedestrian Count
# Making a copy of the data
mom_df = pedestrian_df_N.copy()
# Convert to datetime
mom_df["Date"] = pd.to_datetime(mom_df["Date"], errors='coerce')
# Extract year and month
mom_df["year_month"] = mom_df["Date"].dt.to_period("M")
mom_df["year"] = mom_df["Date"].dt.year
# Remove current month
current_year_month = datetime.datetime.today().strftime("%Y-%m")
mom_df = mom_df[mom_df["year_month"].astype(str) != current_year_month]
# Sum counts per month
monthly_counts = mom_df.groupby(["year_month", "year"])["pedestriancount"].sum().reset_index()
monthly_counts["year_month"] = monthly_counts["year_month"].astype(str)
# Plotting
plt.figure(figsize=(12, 6))
for year in sorted(monthly_counts["year"].unique()):
year_data = monthly_counts[monthly_counts["year"] == year]
plt.plot(year_data["year_month"], year_data["pedestriancount"],
marker='o', linestyle='-', label=str(year), color="#ffb84d")
plt.xlabel("Month")
plt.ylabel("Total Pedestrian Count")
plt.title("Monthly Pedestrian Count")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
According to the above plot:
- The pedestrian count generally increases year over year. This suggests growing foot traffic in urban areas, possibly due to economic recovery, better infrastructure, or population growth.
- Each year shows periodic rises and falls in pedestrian counts. Possible contributing factors:
  - Weather: cold months may have lower pedestrian activity.
  - Events and holidays: some peaks may correspond to major city events.
  - Work and school cycles: summer vacations and holiday periods might show variations.
- Post-pandemic recovery (2021-2022): 2021 starts with a low count, likely due to lingering COVID-19 restrictions, while 2022 shows rapid growth, indicating a return to normal foot traffic levels.
- Recent trends (2024-2025): slight fluctuations in 2024, but pedestrian counts remain relatively high; early 2025 shows a peak, possibly indicating an ongoing upward trend.
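The year-over-year growth described above can be quantified directly with `pct_change` on yearly totals. A hedged sketch with illustrative numbers (not the real monthly_counts data):

```python
import pandas as pd

# Illustrative yearly pedestrian totals (made up for the sketch)
yearly = pd.Series({2021: 800, 2022: 1200, 2023: 1320})

# Percentage change relative to the previous year
growth = yearly.pct_change(fill_method=None) * 100

print(round(growth.loc[2022]))  # 50
```

Applying this to the real `monthly_counts` aggregated by year would turn the visual impression of "rapid growth in 2022" into a concrete percentage.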
Plotting Hourly Pedestrian Count
# Aggregating pedestrian count by hour of the day
hourly_counts = pedestrian_df_N.groupby("hourday")["pedestriancount"].sum().reset_index()
# Identifying peak hours and off peak hours
top_hours = hourly_counts.nlargest(6, "pedestriancount")["hourday"]
lowest_hours = hourly_counts.nsmallest(7, "pedestriancount")["hourday"]
# Normalizing the counts for color intensity (higher count = darker)
norm = mcolors.Normalize(vmin=hourly_counts["pedestriancount"].min(), vmax=hourly_counts["pedestriancount"].max())
# Defining colors
yellow_shades = [ "#FFF2CC", "#FFEB99", "#FFDD66", "#FFCC33", "#FFB800",]
num_shades = len(yellow_shades)
colors = [yellow_shades[int(norm(count) * (num_shades - 1))] for count in hourly_counts["pedestriancount"]]
# Plotting the bar chart
plt.figure(figsize=(8, 6))
bars = plt.bar(hourly_counts["hourday"], hourly_counts["pedestriancount"], color=colors)
plt.xlabel("Hour of the Day")
plt.ylabel("Total Pedestrian Count")
plt.title("Pedestrian Count by Hour of the Day")
plt.xticks(range(0, 24))
plt.grid(axis='y', linestyle='--')
# Creating the legend
legend_labels = [
Patch(color=yellow_shades[0], label="Low Pedestrian Count"),
Patch(color=yellow_shades[-1], label="High Pedestrian Count")
]
plt.legend(handles=legend_labels, loc='upper right')
plt.tight_layout()
plt.show()
According to the above bar chart:
- Peak hours (darkest bars) occur mostly in the afternoon to early evening, around 12 PM to 5 PM. This suggests that pedestrian footfall is highest around lunchtime and the evening rush.
- Off-peak hours (lightest bars) fall in the late-night and early-morning period between 12 AM and 6 AM, when pedestrian activity is at its lowest.
- Moderate-traffic hours (mid-tone bars) show an increasing trend through the morning and a gradual decline in the evening after peak hours.
Plotting Pedestrian Count by Weekday
# Converting 'Date' column to datetime format
pedestrian_df_N['Date'] = pd.to_datetime(pedestrian_df_N['Date'], errors='coerce')
pedestrian_df_N['Weekday'] = pedestrian_df_N['Date'].dt.day_name()
# Group by weekday and sum pedestrian counts
weekday_counts = pedestrian_df_N.groupby('Weekday')['pedestriancount'].sum()
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekday_counts = weekday_counts.reindex(weekday_order)
# Calculating percentages
weekday_percent = (weekday_counts / weekday_counts.sum() * 100).round(1)
norm = mcolors.Normalize(vmin=weekday_counts.min(), vmax=weekday_counts.max())
# Defining colors
yellow_shades = ["#FFF2CC", "#FFEB99", "#FFDD66", "#FFCC33","#FFB800",]
num_shades = len(yellow_shades)
colors = [yellow_shades[int(norm(count) * (num_shades - 1))] for count in weekday_counts.values]
# Plotting
plt.figure(figsize=(6, 4))
ax = sns.barplot(x=weekday_counts.index, y=weekday_counts.values, palette=colors)
for i, (value, percent) in enumerate(zip(weekday_counts.values, weekday_percent.values)):
ax.text(i, value + 100, f'{percent}%', ha='center', fontsize=9)
plt.ylabel("Total Pedestrian Count")
plt.xlabel("Weekday")
plt.title("Pedestrian Count by Weekday")
plt.tight_layout()
plt.show()
According to the above bar chart, Fridays and Saturdays are the busiest days, while Sundays and Mondays are the least busy days of the week.
Plotting Pedestrian Counts by Street
# Group and sort pedestrian counts by street
street_counts = pedestrian_df_N.groupby('Street')['pedestriancount'].sum().sort_values(ascending=False)
# Calculating percentages
total_count = street_counts.sum()
street_percent = (street_counts / total_count * 100).round(1)
# Normalizing
norm = mcolors.Normalize(vmin=street_counts.min(), vmax=street_counts.max())
# Defining colors
yellow_shades = [ "#FFF2CC", "#FFEB99", "#FFDD66", "#FFCC33", "#FFB800",]
num_shades = len(yellow_shades)
colors = [yellow_shades[int(norm(count) * (num_shades - 1))] for count in street_counts.values]
# Plotting the bar chart
plt.figure(figsize=(12, 6))
ax = sns.barplot(x=street_counts.values, y=street_counts.index, palette=colors)
# Add percentage labels
for i, (value, percent) in enumerate(zip(street_counts.values, street_percent.values)):
ax.text(value + 50, i, f'{percent}%', va='center', fontsize=9)
plt.xlabel("Total Pedestrian Count")
plt.ylabel("Street")
plt.title("Total Pedestrian Count by Street")
plt.tight_layout()
plt.show()
Swanston Street and Flinders Street are the busiest streets.
Plotting the Busiest Streets by Weekday
# Create 'Weekday' column from 'Date'
pedestrian_df_N['Date'] = pd.to_datetime(pedestrian_df_N['Date'])
pedestrian_df_N['Weekday'] = pedestrian_df_N['Date'].dt.day_name()
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday','Sunday',]
weekday_dtype = pd.CategoricalDtype(categories=weekday_order, ordered=True)
pedestrian_df_N['Weekday'] = pedestrian_df_N['Weekday'].astype(weekday_dtype)
# Group by Weekday and Street
weekday_street_counts = pedestrian_df_N.groupby(['Weekday', 'Street'], observed=False).size().reset_index(name='Count')
num_categories = weekday_street_counts['Weekday'].nunique()
nrows = (num_categories // 3) + 1
ncols = 3
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15, 5 * nrows))
axes = axes.flatten()
X = weekday_street_counts.groupby('Weekday', sort=False, observed=False)
num = 0
for category, group in X:
df = pd.DataFrame(group)
top_5_streets = df.nlargest(5, 'Count')
x_labels = top_5_streets['Street'].values
y_values = top_5_streets['Count'].values
# Plotting
ax = axes[num]
bars = ax.bar(x_labels, y_values, color='#ffdd99')
ax.set_title(f'Top 5 Streets for {category}')
ax.set_xlabel('Street')
ax.set_ylabel('Pedestrian Count')
ax.set_xticks(range(len(x_labels)))
ax.set_xticklabels(x_labels, rotation=90)
total = y_values.sum()
for bar, count in zip(bars, y_values):
height = bar.get_height()
percentage = f'{(count / total * 100):.0f}%'
ax.annotate(percentage, xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3), textcoords="offset points", ha='center', va='top')
num += 1
for i in range(num, len(axes)):
axes[i].axis('off')
plt.tight_layout()
plt.show()
According to the multiple bar charts above, Swanston Street and Flinders Street are the busiest streets on every day of the week.
# Convert coordinates to numeric
pedestrian_df_N['latitude'] = pd.to_numeric(pedestrian_df_N['latitude'], errors='coerce')
pedestrian_df_N['longitude'] = pd.to_numeric(pedestrian_df_N['longitude'], errors='coerce')
# Filter for 2024 and drop missing values
recent_df = pedestrian_df_N[pedestrian_df_N['Date'].dt.year == 2024].dropna(subset=['latitude', 'longitude'])
# Prepare data for HeatMap
data = recent_df[['latitude', 'longitude', 'pedestriancount']].values.tolist()
# Set map center
map_center = [recent_df['latitude'].mean(), recent_df['longitude'].mean()]
Heat_map = folium.Map(location=map_center, zoom_start=13)
# Add HeatMap layer
HeatMap(data, radius=15, blur=10, max_zoom=1).add_to(Heat_map)
# Add title
title_html = """
<h3 style="text-align: center; margin: 10px 0;">Pedestrian Density Heatmap</h3>
"""
Heat_map.get_root().html.add_child(folium.Element(title_html))
# Display the map
display(Heat_map)
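One detail worth noting: the third element of each HeatMap data point acts as a per-point weight, and raw pedestrian counts can span several orders of magnitude. A pandas-only sketch (with made-up counts, not the real `recent_df`) of scaling the weight column to the 0-1 range, which tends to make the heat intensity easier to tune:

```python
import pandas as pd

# Made-up pedestrian counts standing in for the real sensor totals
df = pd.DataFrame({"pedestriancount": [100, 300, 500]})

# Scale counts to 0-1 so the largest location has full heat intensity
df["weight"] = df["pedestriancount"] / df["pedestriancount"].max()

print(df["weight"].tolist())  # [0.2, 0.6, 1.0]
```

The scaled `weight` column would then replace `pedestriancount` as the third element of each `[latitude, longitude, weight]` triple passed to HeatMap.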